An Author Profiling Approach Based on Language-dependent Content and Stylometric Features
نویسندگان
چکیده
We describe the approach that we submitted to the 2015 PAN competition [5] for the author profiling task. The task consists in predicting some attributes of an author analyzing a set of his/her Twitter tweets. We consider several sets of stylometric and content features, and different decision algorithms: we use a different combination of features and decision algorithm for each language-attribute pair, hence treating it as an individual problem.
منابع مشابه
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
متن کاملAuthor Profiling using Complementary Second Order Attributes and Stylometric Features
In this paper we present an approach for the task of author profiling. We propose a modular framework, extracting two main group of features, combined with appropriate preprocessing, implementing Support Vector Machines for classification. The two main groups we used were stylometric and discriminative, featuring trigrams on one hand and complementary-weighted Second Order Attributes on the oth...
متن کاملGrammar Checker Features for Author Identification and Author Profiling Notebook for PAN at CLEF 2013
Our work on author identification and author profiling is based on the question: Can the number and the types of grammatical errors serve as indicators for a specific author or a group of people? In order to detect the grammatical errors we base our approach on the output of the open-source library LanguageTool. In the case of the author identification we transform the problem into a statistica...
متن کاملAuthor Profiling using Stylometric and Structural Feature Groupings
In this paper we present an approach for the task of author profiling. We propose a coherent grouping of features combined with appropriate preprocessing steps for each group. The groups we used were stylometric and structural, featuring among others, trigrams and counts of twitter specific characteristics. We address gender and age prediction as a classification task and personality prediction...
متن کاملDetecting Promotional Content in Wikipedia
This paper presents an approach for detecting promotional content in Wikipedia. By incorporating stylometric features, including features based on n-gram and PCFG language models, we demonstrate improved accuracy at identifying promotional articles, compared to using only lexical information and metafeatures.
متن کامل